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SEMIPARAMETRIC DETECTION OF SIGNIFICANT ACTIVATION 
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Functional magnetic resonance imaging (fMRI) aims to locate 
activated regions in human brains when specific tasks are performed. 
The conventional tool for analyzing fMRI data applies some variant 
of the linear model, which is restrictive in modeling assumptions. 
To yield more accurate prediction of the time-course behavior of 
neuronal responses, the semiparametric inference for the underlying 
hemodynamic response function is developed to identify significantly 
activated voxels. Under mild regularity conditions, we demonstrate 
that a class of the proposed semiparametric test statistics, based on 
the local linear estimation technique, follow y 2 distributions under 
null hypotheses for a number of useful hypotheses. Furthermore, the 
asymptotic power functions of the constructed tests are derived un- 
der the fixed and contiguous alternatives. Simulation evaluations and 
real fMRI data application suggest that the semiparametric inference 
procedure provides more efficient detection of activated brain areas 
than the popular imaging analysis tools AFNI and FSL. 

1. Introduction. Neuroscience is a discipline dedicated to studying the 
structure, function and pathology of the brain and nervous system, and lies 
at the forefront of investigation of the brain and mind. Functional magnetic 
resonance imaging (fMRI) has emerged as a new and exciting noninvasive 
imaging technique that aims to localize functional brain areas in a living 
human brain, that is, to detect areas or regions that are responsible for the 
processing of certain stimuli. 
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Adequate statistical modeling and analysis of the massive spatio-temporal 
data sets generated by fMRI pose significant challenges to conventional sta- 
tistical methods. First, a typical fMRI data set for a single scan on a single 
subject contains a (temporally) highly correlated time series of measure- 
ments taken every two seconds or so for about an hour on each of, say, 
64 x 64 x 30 voxels (a voxel is a volume element in three-dimensional space) 
throughout the brain. Accordingly, the data sets are so enormous that proper 
accomodation of both temporal and spatial correlation is needed. Second, 
models relating fMRI signals to neural changes are complex. The standard 
tool for analyzing fMRI data is some variant of the linear model, usually fit- 
ted separately by least-squares to each voxel [Worsley and Friston (1995)]. 
After that, tests of significance of the model parameters are performed and 
colors are drawn on top of significant voxels. This comprises the major proce- 
dure of statistical parametric mapping (SPM), popularly used in neuroimage 
study [Friston et al. (1997)]. Recent reviews of the statistical issues involved 
in fMRI for brain imaging and the statistical methods for analyzing fMRI 
data can be found in Lange (1996), Lazar et al. (2001), Fahrmeir and Gdssl 
(2002) and Worsley, et al. (2002), among others. 

In this paper, we aim to develop voxelwise semiparametric inference for 
the underlying hemodynamic response function (HRF), the object of pri- 
mary interest to neuroscientists. For instance, identifying whether a partic- 
ular voxel is activated when a subject performs certain motor, sensory or 
cognitive tasks can be achieved by means of a statistical test of the hypoth- 
esis that HRF is zero. In order to generate brain activation maps, statistical 
inference must be drawn from voxelwise estimates of HRF. We will first de- 
velop a semiparametric modeling and estimation approach to obtain statis- 
tically more efficient estimates of the underlying HRF associated with fMRI 
experiments. Compared with the general linear model approach in previous 
studies, our approach has the advantage that we neither specify any a priori 
parametric shape for the HRF, nor do we assume any particular form for 
the temporal drift function. Taking full advantage of these flexibilities will 
help to reduce the bias due to model misspecification and to enhance the 
power of detection. 

Addressing the issue of semiparametric inference for brain fMRI is a non- 
trivial task, however. Existing parametric statistical inference procedures for 
fMRI are not immediately applicable to our approach in which the HRFs are 
estimated semiparametrically. The work on the generalized likelihood ratio 
test [Fan, Zhang and Zhang (2001)] sheds light on nonparametric inference, 
based on function estimation under nonparametric models with indepen- 
dent errors, and, at the same time, is not readily translated into results 
from other models. Moreover, as emphasized in Section 3, some standard 
results for semiparametric models are not directly applicable to the context 
of fMRI data due to the distinctive feature of the Toeplitz design matrix 
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and the complicated dependence structure of the error process. Hence, a 
rigorous investigation of semiparametric inference applied to the important 
area of fMRI research is required. This paper fills that gap in the literature. 
Under mild regularity conditions, we show that a class of the proposed semi- 
parametric test statistics follow x 2 distributions under null hypotheses for a 
number of useful hypotheses. To yield improved finite-sample performance 
of the proposed test statistic, we further explore its bias-corrected version 
and derive the corresponding asymptotic distribution. Moreover, the asymp- 
totic power functions of the constructed tests are derived under the fixed 
and contiguous alternatives. These results are not only important for gaining 
theoretical insight into semiparametric inference applied to a much broader 
range of scientific problems, but also helpful in offering valuable practical 
guidance for the implementation of these techniques. 

The rest of the paper is arranged as follows. Section 2 reviews statis- 
tical models for single-voxel and single-run fMRI. Section 3 describes the 
semiparametric estimation of the HRF, based on the local linear nonpar a- 
metric smoothing technique. Section 4 establishes the asymptotic distribu- 
tion of the proposed test statistics. Section 5 presents simulation evaluations 
and compares the activated brain regions using the popular imaging anal- 
ysis tools AFNI (at http://afni.nimh.nih.gov/afni/) [Cox (1996)] and 
FSL (at http : //www . f mrib . ox . ac . uk/f si/) [Smith et al. (2004) and Wool- 
rich et al. (2001)]. Section 6 applies the semiparametric inference to a real 
fMRI data set. Technical conditions and detailed proofs are deferred to the 
Appendix. 

2. Statistical models for single-voxel and single-run fMRI. We begin 
with a brief overview of the convolution model popularly used in fMRI study. 
The BOLD (blood oxygenation level-dependent) signal response to neuronal 
activity is heavily lagged and damped by the hemodynamic response. Fol- 
lowing Ward (2001) and Worsley et al. (2002), a single-voxel fMRI time 
series {s(ti),y(ti)}2 =1 for a given scan and a given subject, can be captured 
by the convolution model, 

(2.1) y{t) = s*h(t) + d(t) + e{t), t = h,...,t n , 

where * denotes the convolution operator, y(t) is the measured noisy fMRI 
signal, s(t) is the external input stimulus at time t [which could be from a 
design either block- or event-related and where s(t) = 1 or indicates the 
presence or absence of a stimulus], h(t) is the hemodynamic response func- 
tion (HRF) at time t after neural activity, d(t) is a slowly drifting baseline 
of time t, and e(t) is a zero-mean error process, consisting of nonneural noise 
(due to respiration and blood flow pulsations through the cardiac cycle) and 
"white noise" (from random/thermal currents in the body and the scanner). 
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Table 1 

HRF, drift and error implemented in AFNI and FSL 





AFNI (tool 3dDeconvolve) 


FSL (tool FEAT) 


hit) 


finite impulse response filter 


difference of two gamma functions, 






which is the canonical form 


d(t) 


quadratic polynomial 


removed in the preprocessing, using 






high-pass temporal filtering (Gaussian- 






weighted LSF straight line fitting) 


e(t) 


i.i.d. 


autocorrelation estimated by Tukey 






tapering of the spectrum of the residuals 



2.1. Existing methods for modeling HRF, drift and error. In neuroimag- 
ing studies, most existing methods model h(-) as the difference of two gamma 
functions or a linear combination of gamma functions, a linear combination 
of a gamma function and its Taylor expansion [Worsley et al. (2002), Lange 
and Zeger (1997), Josephs and Henson (1999)]. Genovese (2000) constructed 
h(-) as a "bell" function with cubic splines. As a nuisance component in (2.1), 
the temporal drift d(-) is usually approximated by a quadratic or higher- 
order polynomial [Worsley et al. (2002)] or polynomial splines [Genovese 
(2000)]. Note that restrictive assumptions on the HRF and drift may pro- 
duce biased estimates of the true hemodynamic responses. Goutte, Nielsen 
and Hansen (2000) estimated h(-) using smooth FIR filters and reported 
that some subtle details of the HRF can be revealed by the filters, but 
not by previous approaches based on gamma functions. The errors e(ij) are 
well known to be temporally autocorrelated. Genovese (2000) assumed in- 
dependent errors for computational convenience. Other assumptions like the 
AR(p) structure, most commonly AR(1), are used in Worsley et al. (2002). 
As an illustration, Table 1 tabulates the HRF, drift and error implemented 
in software AFNI and FSL. 

3. Semiparametric estimation of HRF. Estimating the HRF in (2.1) is 
a deconvolution problem. Ideally, the HRF is a high-dimensional smooth 
function and is nonidentically zero if the voxel responds to the stimuli. We 
will describe a semiparametric method for characterizing properties of the 
hemodynamic response in the presence of unknown smooth drift. Such char- 
acterization is essential for accurate prediction of time-course behavior of 
neuronal responses. 

Typically, the peak value of HRF h(-) is reached after a short delay of the 
stimulus and drops quickly to zero. A typical example of h(-), given in Glover 
(1999), is plotted in Figure 1. Clearly, the region {t:h(t) ^ 0} is sparse in 
its temporal domain. Thus, to obtain statistically efficient estimates of the 
HRF associated with event-related fMRI experiments, the sparsity of the 
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HRF needs to be taken into account. We thus suppose that h(t) = for 
t > t m and focus on estimating the first m values of h(ti), where m is less 
than re, the length of the fMRI time series. Similarly to the regularization 
technique discussed in Bickel and Li (2006), such a qualitative assumption 
aims to obtain well-behaved solutions to overparametrized estimation prob- 
lems and is thus particularly appealing for dimension reduction with high- 
dimensional problems. The semiparametric modeling and inference in this 
paper are applicable to all m < re. Data-driven selection of m can be made 
via a change-point approach or other model-selection criteria. To facilitate 
discussion, we assume that y(-) and s(-) have equal time resolutions of one 
second. Letting y = (y(h), y{t n )) J ', 

" s(0) ••• 

s(t 2 -h) a(0) ••• 

S = ' ' 

s(t m - h) s(t m -t 2 ) •■■ s(0) 

,s(t n -ti) s(t n -t 2 ) ••• s(t n -t m ). 

h = (h(h), h(t m )) T , d = (d(h), . . . , d{t n )) T and e = (e(h),. . . , e{t n )) T , 
model (2.1) can be re-expressed as y = Sh + d + e, where S is a Toeplitz 
matrix. 

In general, for multiple types of stimuli, model (2.1) can be extended to 

be 

(3.1) y(t) = s 1 *h 1 (t) + --- + s r *h r (t) + d(t) + e(t), t = t 1 ,...,t n . 

Corresponding to the jth type of stimulus, denote by Sj(-) the time- varying 
stimulus function, by Sj the n x m Toeplitz design matrix and by hj the 
m x 1 vector of the HRF. Model (3.1) can then be rewritten as 

(3.2) y = Sihi + • • • + S r h r + d + e = Sh + d + e, 

where S = [Si, ... , S r ] and h = [hj , . . . , hJ] T . To accommodate fMRI data 
with multiple runs, we only need to supplement the matrix S by adding the 
Toeplitz design matrix arising from each run. 




Fig. 1. An illustrative plot of HRF h(tj) withn = 80. 
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Model (3.2) is conceivably a semiparametric regression model, with a vec- 
tor h of length rm for parametric components and a vector d of length n 
for nonparametric components. The parametric components (related to the 
unknown HRF) are of primary interest, whereas the nonparametric compo- 
nents (related to the unknown temporal drift) serve as nuisance effects, and 
the noise components e are serially correlated. We wish to emphasize that 
due to the special structure of the design matrix S associated with fMRI 
design, some commonly used assumptions, such as independence between 
rows of a design matrix, fail to hold. In addition, the unobservable true cor- 
relation structure of e is often complicated. Thus some standard results for 
semiparametric models are not directly applicable to the current fMRI data. 

We now describe the semiparametric estimation of both the HRF and 
the nonparametric drift function in (3.2). Let Sd be an n x n local linear 
smoothing matrix associated with the design points {t±, . . . ,t n }, with the 
(i,j)th entry equal to 

(3.3) S d (i,j) = (l,0){X(t l ) T W(t i )X(t i )}^ 1 (l,t j -ti) T K((t, -U)/b)/b, 
where K is a kernel function, b > is a bandwidth parameter, 

1 ti-t' 

X(t) 

and 



1 t n -t 



W(t) = diag{^((t! - t)/b)/b, K((t n - t)/b)/b}; 

see Fan and Gijbels (1996), which provides a comprehensive account of the 
local linear and local polynomial regression techniques. (For expositional 
simplicity, this paper is confined to the local linear method.) Note that 
the matrix Sd carries information about the design points, kernel K and 
bandwidth 6, but does not rely on the configuration of the response variables. 
We refer to Section 2.3 of Zhang (2003) for further discussion of finite-sample 
and asymptotic properties of the smoothing matrix. Notice that smoothing 
the entries of y via the local linear method is equivalent to applying Sd to 
y. We observe from (3.2) that 

(3.4) y = Sh + d + ?, 

where y = (I - S d )y, S = (I - S d )S 2 d = (I - S d )d, e = (I - S d )e and I de- 
notes an identity matrix. Ignoring d, model (3.4) can be regarded as a gen- 
eral linear model. Denote by R the true correlation matrix of e, namely, 
cov(e,e) = a 2 R, with variance a 2 . Let R be an estimate of R. By the 
weighted least-squares method, an estimate of h is produced by 
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which, in turn, supplies estimates of the drift components formed by 

d = 5,(y-Sh). 

4. Semiparametric hypothesis test for HRF. Identification of a partic- 
ular brain region with a specific function has become a central theme in 
neuroscience. In this section, we consider constructing test statistics to test 
whether a particular voxel is activated by the stimuli and whether HRFs 
activated by different types of stimuli really differ. They correspond to test- 
ing the hypotheses Ho : h = versus H\ : h / and Hq : hj 1 = hj 2 versus 
Hi :hj 1 ^ hj 2 , where ji 7^ j2- Under the semiparametric model (3.2), all of 
these testing problems can be formulated in a more general form, 

(4.1) H :Ah = O versus #1 : Ah / 0, 

where A is a full row rank matrix with rank(A) = k. 

An earlier work on developing pseudo-F-type test statistics was empiri- 
cally studied in Lu (2006) and Zhang et al. (2006). There, it was observed 
from QQ plots in simulation studies that under the null hypothesis, em- 
pirical quantiles of the F-type test statistics (in the restrictive case where 
the true R is known and a single type of stimulus in the fMRI experiment 
is presented) could be approximated by quantiles of the ^-distribution. No 
asymptotic exploration of properties of the F-type test statistics was con- 
ducted. 

4.1. Asymptotic null distributions. Motivated by the parametric F-statistic 
in linear regression models and the justification of power comparison [Zhang 
and Dette (2004)] between nonparametric tests for regression curves based 
on kernel smoothing techniques, we first examine the following semipara- 
metric test statistic, represented by 

_ {Ahf{A(^ T R- x S)- 1 A T }- x (A\i) 

1C — — , 

r^i?^ 1 ?/ (n — rm) 

where f = y — Sh. Theorem 4.1 below establishes the asymptotic null dis- 
tribution of K. 

Theorem 4.1. Assume Condition A in the Appendix. Then, under Hq 
in (4-1), where A is a k x rm matrix with rank(A) = k, it follows that IK— 
where — > denotes convergence in distribution. 

Our simulation evaluation in Section 5 demonstrates that the finite sam- 
pling distribution of K is reasonably well approximated by its asymptotic x 2 
distribution, whereas when the noise level decreases, the approximation may 
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become less accurate; see Figure 2 (right panel). Technically, as manifested 
in the proof of Theorem 4.1, the asymptotic \ 2 distribution of K follows 
from the asymptotic normality of h shown in Lemma A. 7, which relies on 
the fact that a term J\ (associated with the drift vector d) is stochastically 
dominated by a term Ji (associated with the error vector e). Practically, in 
finite-sample situations, low noise levels do not necessarily guarantee that J\ 
is stochastically negligible compared with J<i- Consequently, the finite sam- 
pling distributions of h and K may appear biased toward the normal and x 2 
distribution, respectively. In these situations, we adopt the bias-corrected 
version of K, defined as 

w (^h bc ) r {A(s r ^ i s)- i ^}- i (^h bc ) 

r^R l r hc /(n-rm) 

where h bc = h - {S T R^Sy 1 ^ R^d, r bc = f - d, 5 = S d (y - Sh) and 

d = (I — S d )d. Note that as the sequence length n grows, d is negligible, but 
practically adjusts for the bias caused by J\ due to the ignorance of d in 
(3.4). Theorem 4.2 below reveals that K bc and K have the same asymptotic 
null distributions. 

Theorem 4.2. Assume Condition A in the Appendix. Then, under Hq 
in (4-1), where A is a k x rm matrix with rank(A) = k, it follows that 

We now make some remarks concerning the derivations of Theorems 4.1- 

4.2. First, it is tempting to try to show that n _1 S r i?~ 1 S —> M for some pos- 

V 

itive definite matrix M, where — > denotes converges in probability. Nonethe- 
less, for fMRI data, since the n x n correlation matrix R of e is generally 
far more complicated than the diagonal matrix of independent errors, deriv- 
ing an explicit form for M is nearly intractable. To overcome this technical 
difficulty, we have demonstrated that it suffices to verify that R satisfies 

var{n-^J lA (I - S d ) T R- l (l - S d )^ 2 } - 

for all ji, j2 = 1, . . . ,r and all £±,£2 = 1, • • • where £^ is the £th column 
vector of Sj and R fulfills Condition A8 in the Appendix, 

E(\\R^-R^\\lo) = o(l), 

ll-BHoo = maxi<j< n X)?=i \B(i,j)\ denoting the oo-norm of an n x n matrix 
B; see Lemma A. 6 and Corollary A. 2. Thus, the explicit form of M is not 
needed in deriving the asymptotic null distributions of K and K bc . Second, 
Condition A8, together with ||5||2 < { ||i?||i H-BHoo} 1 / 2 [Golub and Van Loan 
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(1996)] and the symmetry of R and R, guarantees that — -R _1 ||2 = 

op(l), which is typically interpreted as the "consistency" of large covariance 
matrix estimators [Bickel and Levina (2008)]. 

Remark 1. In real- world applications, fMRI sequence lengths are not 
very long. For instance, n is 185 for each run in the real fMRI data set 
described in Section 6. This indicates that the "mixing assumptions," com- 
monly made in the asymptotic studies of nonlinear time series [Bosq (1998), 
Fan and Yao (2003)], may not hold for fMRI data. Therefore, the sampling 
properties of K and Kb c are studied using the more realistic error assumption 
A3 of Condition A in the Appendix, which could possibly be weakened. 

Remark 2. Throughout the numerical work in this paper, parametric 
estimation of the error covariance matrix adopts a computationally fast and 
effective scheme developed in Zhang et al. (2006), which assumes g = 2 in 
Condition A3 of the Appendix. This regularized estimator is constructed 
as follows: obtain the transformed data e(i,) by applying the second-order 
difference to y(U) — J2 T j=i s j * hj(U); calculate autocovariances {"fe(j)}j=o 
of e(ti), which form a linear system for autocovariances {"f(j)}j = Q of e(ij); 
substitute for {"f e (j)} their empirical moment estimates and solve {7(7)}; ac- 
quire an estimate R of R using Condition A3. Moreover, since an fMRI data 
set contains time-course measurements over voxels, the number of which is 
typically of the order of 10 4 -10 5 , the conventional false discovery rate (FDR) 
approach [Benjamini and Hochberg (1995), Storey (2002)] can be adopted 
to account for the multiple comparisons problem. Other useful and elabo- 
rate procedures for covariance matrix estimation and multiple comparison 
may also be employed. Particularly, Zhang et al. (2006) presented numerical 
evidence that the existing FDR approach tends to find activation in tiny 
scattered regions of the brain which are more likely to be false discoveries, 
and carefully devised a new FDR approach which gains efficiency over the 
existing FDR approach. 

4.2. Asymptotic power functions. To appreciate the discriminating power 
of the proposed tests in assessing the significance of activated areas, the 
asymptotic power is analyzed. Theorem 4.3 demonstrates that both K and 
Kbc are consistent against all fixed deviations from the null model. 

Theorem 4.3. Assume Condition A in the Appendix and n~ 1 S T R~~ 1 S -^>M, 
where M is positive definite. Then, under the fixed alternative Hi in (4-1); 

n^K Z (AhfiAM^A^Ah/a 2 > 0, 

n -1 K bc £ {AhfiAM^A^Ah/a 2 > 0. 
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The results in Theorem 4.3 indicate that under the fixed alternative H\, 

V V 
K— >+oo and K.b c — >+oo, 

at the common rate n. Hence, the test statistics IK and Kb c have power 
functions tending to one against fixed alternatives. 
Consider a sequence of local alternatives, defined by 

(4.2) H ln :Ah = 5 n c, 

where 5 n = n" 1 ! 2 and c = (ci, . . . , c\ i ) T ^ 0. Theorem 4.4 explores the asymp- 
totic distributions of K and Kb c under the local alternatives H\ n . 

Theorem 4.4. Assume Condition A in the Appendix and n~ l S T R -1 x 

~ -p 

S^M, where M is positive definite. Then, under the local alternative H± n 

in {4-2), K^XkiT 2 ) an d ^bc - > Xk( r2 )> with noncentrality parameter r 2 = 
c T (AM~ 1 A T )~ 1 c/a 2 . 

The results in Theorem 4.4 indicate that the tests have nontrivial local 
power detecting local alternatives approaching the null at the rate ra" 1 / 2 . 
A simple calculation shows that the asymptotic power of the tests against 
local misspecification (4.2) equals 

00 exp{-(x + r 2 )/2} ~ x k/2+j-i r 2j 

xi^ a 2 k/2 ^r(k/2+jW x ' 

where Xk-i-a ^ s * ne 1 — Q quantile of the x 2 distribution and T(-) denotes 
the gamma function. 

5. Simulation study. Throughout the numerical work, we use the Epanech- 
nikov kernel function [Silverman (1986)] supported on [—1,1]. A complete 
copy of MATLAB codes is available on request. 

5.1. Hypothesis test of HRF at a single voxel. As an illustration, the 
hypothesis testing for Hq : h = versus H\ : h / is undertaken. This is used 
to test whether the brain activity in a voxel is triggered or not. To check the 
agreement between the x 2 distribution with finite sampling distributions of 
K and ICbc under Hq, the fMRI data are simulated as follows. We simulate 
an fMRI experiment with a single run and a single type of stimulus, where 
n = 400 and 500 realizations are conducted. (I) The time- varying stimuli are 
generated from independent Bernoulli trials such that P{s(ti) = 1} = 0.5. 
(II) The HRF is h(U) = 0, i = 1, . . . , 18 (so that m = 18). (Ill) The drift 
function is d{ti) = 10sin{7r(£j — 0.21)}, i = 1, . . . ,n. (IV) The noise process e 
is the sum of independent noise processes e\ and e<i (see Purdon et al. (2001)); 
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i e i(ti)} are i.i.d. normal with mean zero and variances 0.5216 2 , 0.3689 2 , 
0.2608 2 and 0.1844 2 , respectively; e 2 is AR(1), that is, e 2 (ij) = /oe 2 (t»-i) + 
z(ti) with p = 0.638 and the z(ti) follow the normal distribution with mean 
zero and variances 0.5216 2 , 0.3689 2 , 0.2608 2 and 0.1844 2 , respectively. These 
choices give a noise lag-one autocorrelation equal to 0.4 and signal-to-noise- 
ratios (SNRs) of about 1, 2, 4 and 8, where SNR = variance(Sh) /variance(e). 

The QQ plots of the (1st to 99th) percentiles of IK and IKb c against those 
of the Xm distribution are displayed in Figure 2. In the top panel, IK and 
use the true covariance matrix and fix the smoothing parameters at 
their theoretically optimal values (minimizing the mean squared errors of 
estimators) for estimating the HRF and drift in each simulation. For the 
sake of clarity, only the cases of SNR equal to 1 and 8 are presented; the 
former is the "large noise level" case, whereas the latter is the "small noise 
level" case. In either case, we observe that the finite sampling distributions 
of IK and Kb c , at the realistic sample size 400, agree reasonably well with 
the x 2 distribution. The QQ plots also lend support to the possibility that 
Kbc is better than or at least as good as, the bias-uncorrected counterpart 
K. 

For a more realistic comparison, IK and IKb c in the bottom panel of Figure 
2 use the estimated covariance matrices and data-driven smoothing parame- 
ters. The results are similar in spirit to the ones in the top panel and continue 
to support the bias correction procedure. 



SNR=1. large noise levet SNFt=B. small naso level 




10 15 20 25 30 35 10 IS 20 25 30 35 



SNR=1. large noise level SNR=B, small nrise level 




10 15 20 2 5 30 35 1 15 20 2 5 30 35 



Fig. 2. Empirical quantiles (on the y-axis) of test statistics K and Kt, c (where the top 
panel uses the true R and the optimal smoothing parameters, and the bottom panel uses 
the estimated R and data-driven smoothing parameters) versus quantiles (on the x-axis) 
of Xm distribution. Solid line: the 45 degree reference line. 
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Fig. 3. True activated brain regions (denoted by hot color) for the simulated fMRI data 
set. 

5.2. Detection of activated brain regions. We simulate a whole brain 
fMRI data set, with aim of mimicking true brain activity to the maximum 
extent feasible. The experiment design, timings and size are exactly the same 
as those of the real fMRI data set in Section 6. An HRF profile is extracted 
from a voxel which shows the strongest responses in the real data set. For 
each voxel, the simulated drift is obtained from an adequate smoothing of 
the time series for the corresponding voxel of the real data set. The sim- 
ulated noise variance profile is determined from a variance map, which is 
made by a 5 x 5 x 5 spatial median smoothing on median values of squared 
residuals of the real time series, subtracting the simulated drift profile as 
mentioned before. The noise process e(t) is generated in a fashion similar to 
that of Section 5.1. Specifically, the variances of ei(i) and z(t) are chosen 
to be equal such that var{e(i)} is one-fifth of the variance map. The HRF 
profiles, in accordance with the stimuli in the experiment, is added to two 
regions which are postulated to be truly active. In these two zones, the HRFs 
have been rescaled to about 17% and 12% of the amplitude of the original 
HRF profiles. The purpose of rescaling the HRFs and noise variance is to 
amplify the drift effect and weaken the HRF response so that the estimation 
of the HRF is more challenging. Figure 3 shows nine different slices which 
highlight the two activated brain regions. Note that throughout the paper, 
we apply the same registration transform from the real brain data to the Tl 
high-resolution image of the subject's brain. 

The gain in efficiency achieved by the semiparametric inference procedure 
is illustrated by comparing the activated brain regions identified by our ap- 
proach with those identified via methods offered by the popular software 
AFNI and FSL. The conventional FDR approach is performed at the FDR 
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Fig. 4. Comparison of activated brain regions discovered for the simulated fMRI data 
set. Top panel: K (on the left) and Kb c (on the right). Bottom panel: AFNI (on the left) 
and FSL (on the right). The conventional FDR approach is used. The FDR level is 0.05. 



level 0.05. Inspection of Figure 4 reveals that K and Kt, c are capable of lo- 
cating both active regions. In contrast, both AFNI and FSL fail to locate an 
activated brain area, and the other region, although correctly detected, has 
appreciably reduced size relative to the actual size. This detection bias sug- 
gests that the stringent modeling assumptions in Table 1 should be relaxed 
to ameliorate the effects of misspecification. Furthermore, as evidenced in 
Figure 5, all four methods, when applying the new FDR approach in Zhang 
et al. (2006), achieve more accurate detection than their counterparts in Fig- 
ure 4, with IK and Kt, c continuing to outperform AFNI and FSL. Therefore, 
for applications to the real fMRI data set in Section 6, we will only employ 
the new FDR approach in Zhang et al. (2006). 



6. Real data analysis. In an emotional control study, subjects saw a 
series of negative or positive emotional images and were asked to either 
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suppress or enhance their emotional responses to the image, or to sim- 
ply attend to the image. Therefore, there were six types of trial (i.e., six 
types of stimuli): negative-enhance (neg-enh), negative-attend (neg-att), 
negative-suppress (neg-sup) , positive-enhance (pos-enh) , positive-attend 
(pos-att) and positive-suppress (pos-sup) . The sequence of trials was ran- 
domized. The time between successive trials also varied. There were 24 trials 
each of neg-enh, neg-sup, pos-enh, and pos-sup; there were 11 trials each 
of neg-att and pos-att. 

The size of the whole brain data set is 64 x 64 x 30. At each voxel, the time 
series has six runs, each containing 185 observations with a time resolution 
of two seconds, thus TR = 2 seconds and the total length is 1110. In contrast, 
the length of stimuli is 2220; the timing of the stimuli has a time resolution 
of one second and thus each HRF output will also be sampled at one second. 
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Fig. 5. Comparison of activated brain regions discovered for the simulated fMRI dataset. 
Top panel: WL (on the left) and Kb c (on the right). Bottom panel: AFNI (on the left) and 
FSL (on the right). The new FDR approach in Zhang et al. (2006) is used. The FDR level 
is 0.05. 
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Hence, the odd rows of the design matrix S in (3.2) suffice for analysis. The 
study aims to estimate the BOLD response to each of the trial types for 1-18 
seconds following the image onset. We analyze the fMRI data set containing 
one subject. The length of the estimated HRF is set equal to 18. 

A comparison of the activated brain regions detected by K, Kbo AFNI 
and FSL is illustrated in Figure 6. Again, the HRF in FSL is specified as the 
difference of two gamma functions and the drift term in AFNI is specified 
as a quadratic polynomial. We use FDR at level 0.001 to carry out the 
multiple comparisons. This level is set to avoid excessive discoveries, most 
of which are thought to be false. Our detected regions are closer to those 
obtained by AFNI, but our methods find activation in much more clustered 
regions of the brain. For example, our results do not have the holes seen in 
the detected regions on the first slice of AFNI and FSL. AFNI gives more 
tiny scattered findings, which are more likely to be false discoveries. FSL 



mm 






1 J| 






I Jl I 












K 1 


Metal »■ 













Fig. 6. Comparison of activated brain regions discovered for the real fMRI data set. Top 
panel: K (on the left) and Kb c (on the right). Bottom panel: AFNI (on the left) and FSL 
(on the right). The new FDR approach in Zhang et al. (2006) is used. The FDR level is 
0.001. 
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detects very scattered regions which are difficult to interpret. In addition, 
the volumes of the regions detected by FSL are substantially smaller than 
those of AFNI and our methods. 

APPENDIX: PROOFS OF MAIN RESULTS 

We first impose some technical assumptions, which are not the weakest 
possible. Throughout the proof, C is used as a generic finite constant. 

Condition A. 

Al. The drift function d(t) has a bounded continuous second derivative. 

A2. The kernel K is a symmetric probability density function with com- 
pact support, say [—L,L], is Lipschitz continuous and such that sup f K(t) < 
C for some constant C G (0,oo). 

A3. Assume that {e(ti)} is a stationary ^-dependent sequence with E{e(t\)} = 
0, E{e 2 G (0,oo)(ti)} = a 2 and E{e 4 (ti)} < oo. The eigenvalues of R, the 
true correlation matrix of e, are uniformly bounded away from zero and 
infinity. Furthermore, E{e(ti)e(tj)\R} = E{e(ti)e(tj)}. 

A4. In model (3.1), j = 1, . . . , r, are independent of {e(-)}. For the 

RPER design, Sj(t) is stationary and P{sj(t) = 1} = pj G (0, 1), j = 1, . . . ,r, 
and Y%=iPj < 1- Assume that Sj^tu) and Sj 2 (t v ) are independent at any 
t u ^t v . For any u,v = l,...,n, E{sj(t u )sj(t v )\R} = E{sj(t u )sj(t v )} . 

A5. n — > oo, b — > and nb — > oo. 

A6. ti = i/n, i = 1, . . . ,n. 

A7. cov(S T ,S T ) >0. 

A8. J E(|| J R- 1 - J R" 1 ||^ ) = o(l). 

We next introduce some necessary notation and definitions. 

Notation. For the kernel K and bandwidth b > 0, define K b (t) = K(t/b) /b. 
Denote by ej the jth column of an identity matrix. Define vectors 1 = 
(1, . . . , 1) and 0= (0, ...,0) T . Define a matrix H with entries H(i,j) = 
n^Kfritj — ti), 1 < i, j <n. Define V = and let p(l) denote the noise 
autocorrelation coefficient. Denote by £^ the £th column vector of Sj, that 
is, £j£ = Sjeg. Throughout the proof, || • || refers to the L2-norm unless oth- 
erwise stated. 

Definition A. 1. An n x n matrix B is called "row absolute value 
uniformly summable" (RAVUS) if there exists C > such that 

n 

sup sup ^2\B(i,j)\ <C. 

n>l l<j<n 
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Likewise, B is called "column absolute value uniformly summable" (CAVUS) 
if there exists C > such that sup n>1 sup 1<i<n 

E"=i \B(i,j)\ < C Moreover, 
if a matrix is both RAVUS and CAVUS, it is called "absolute value uniformly 
summable" (AVUS). 

Before proving the main results of the paper, we need Lemmas A.1-A.8. 

Lemma A. 1. // both matrices B 1 G R nxn and B 2 £ R nxn are AVUS, 
where AVUS is defined in Definition A.l above, then B\B 2 is AVUS. 

Proof. By the definition, there exists C > such that 

n n 

sup sup ^2\Bi(i,j)\<C, sup sup ^\B 2 {i,j)\<C. 

n>ll<j<n^_^ n>l l<j<n --^ 

We observe that 

n n n 

sup sup ^2\(BiB 2 )(i,j)\ <sup sup ^2^2\B 1 (i,l)B 2 (l,j)\ 

n n 

= sup sup ^|£ 2 (Z,j)|^|£i(i,Z)| 

n>ll<j<n l=1 i=l 
n 

<Csup sup J2\MIJ)\<C 2 . 

n>l 1<j <n 

Thus, B\B 2 is RAVUS and, by similar reasoning, is CAVUS. Hence, B\B 2 is 
AVUS. □ 

Lemma A. 2. Assume Condition A3. There then exist constants C E 
(0,oo) and A G (0,1) suc/i toot |V(i,j)| < CA^'I /or a/2 1 < i,j < n and 
n>l. 

Proof. Under Condition A3, R is positive definite, centered and 2g- 
banded. Let a n and b n be the minimum and maximum eigenvalues of R, 
respectively, and set r n = b n /a n . Applying Proposition 2.2 of Demko, Moss 
and Smith (1984) gives that |V(i,j)| < C n An , where C n = maxja" 1 , (1 + 
rn /2 ) 2 /(2a n r n )} and A n = {(rv/ 2 - l)/(r l J 2 + l)}Vfl. From Condition A3, a n 
and b n are bounded away from both zero and infinity; it follows that r n is 
bounded away from both zero and infinity. Thus, there exist C G (0, oo) and 

A G (0, 1) such that C n <C and A n < A. Hence, \V{i,j)\ < C n X l t jl < CA^'l 
□ 
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Lemma A. 3. Assume Conditions A2 and A5. 

1. Let f be Lipschitz continuous and bounded on an interval [d\,d 2 ], where 
d\ < d 2 . Let Uj = d\ + (d 2 — d\)j/n, j = 1, . . . , n. Then, uniformly in t, 

(A.1) -rj/fc).—^ Ki( „_ T , /M(i!l + (_ 

2. Let {e J }|S =1 5e a sequence of g-dependent and identically distributed 
random variables. Assume E(ef) < oo. Then, for tj = j/n, j = 1, . . . , n, 



(A.2) sup E 

te[bL,l-bL] 



)1 



'£e j K b (t j -t)-E(e 1 ) 



3=1 



o 



1 

nb 



Proof. We first show (A.l). By the assumptions, there exists a constant 
C > such that \K(s) - K(t)\ < C\s - t\, K(s) < C, |/(s) - f(t)\ < C\s - t\ 
and |/(s)| < C for any s and t. Define J = {j G Z : n(—bL + T — d\)/(d 2 — d\)< 
j <n(bL + T-d 1 )/(d 2 -di)} = {h,...,l 2 }. Clearly #J < 2nbL/(d 2 - di) +2, 
K b (uj — r) = for any j ^ J and — r) = for u < or u > u; 2+ i. 

It follows that 



1 



n 



J2K b ( Uj -r)f( 



1 



3=1 



d> 



K h (u - T)f{u)du 



1 



d 2 - d\ 



d 2 - di 

E/ K b (uj -T)f(uj)du-^2 K b (u - t) f {u) du 



j'eJ' 



Kb(u-T)f(u) du 



«/., 



< 



do - d 



E 



«* CK--u)|/(^-)| 
6 2 



+ 



C 2 

i^b(u — r)C(uj — u)du \ H - 



< 



nbL 

do - di 



+ 1 



C 2 C 2 \(d 2 -d!) c 



6 2 + & 



n- 



1 

nb 



We now show (A.2). Following (A.l), 



sup E 

t£[bL,l-bL] 



2i 



te[bL 



y^(t r t)-£( £l ) 
n i=i J 

sup var< — ejK b (tj — t) 1 
i,i-6L] 1 n , eJ J 
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+ sup l^-J2K b (h-t)-l) {E(e 1 )} 2 
te[bL,...,l-bL] [ n j=i J 

<g (2 „ 6i + 2W£l) + (^) =0 (_l). n 

Corollary A. 1. Assume Conditions A2, A5 and A6. 

1. For any I = 0, 1, 2, . . . , we have that uniformly in t £ [0, 1], 

1 n r r( 1 ~ t )/ b f 1 

(A.3) -Y KdU-tMU-t) 1 = h [ \ \ K(u)u l du + 0[ — 

n^— [ U-t/6 \nb 

and thus, uniformly in i £ [n6L, . . . , n — nbL], 
(A.4) \fl K b{ti ~ U)(tj - U) 1 = b i y L L K(u)u l du + o(^ 

2. There exists C > such that for all n = 1,2, ... , i £ {1, . . . ,n} and 
je{l,...,n}, 

(A.5) nb\S d (i,j)\<C. 

Moreover, Sd is AVUS. Furthermore, for all n = 1, 2, . . . , i G [n6L, . . . , n — nbL] 
and j £ {1, . . . , n}, 

(A.6) 5 d (i,i)=fl-(*,j)(l + c B ,i), 

w/iere sup n6i <j< n _ n6i |c nji | = 0{l/(nb)}. 

3. Let {ej}'jL 1 be a sequence of g- dependent and identically distributed ran- 
dom variables. Assume E(ei) = and E(e±) < oo. Let Y{ = n~ l Y^j=i e j^b{tj - 
U), e = (ei,...,e n ) T and y = (Y 1 , . . . , Y n ) T . Assume that B £ R nxn is AVUS. 
Then, 

(A.7) n- 1 E(\\Byf)=o(l), 
(A.8) n- 1 E(\\BS d e\\ 2 )=o(l). 

PROOF. Part 1. Following the proof of (A.l), 

i " , ,fu-t\ l 



ig^(%-o(V) -L /b km* 
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<]T / 3 \K b (t,-t)-K b (u-t)\ 



U-t 



du 



E / 3 K h {u-t) 



U -t 



u — t 



du + 



nb 



Note that for j G JJ, we have \(tj — t)/b\ l < L l and that for j G JJ and < 
u < t 3 -, we have \{(tj - t)/b} 1 - {(u - t)/b} l \ < C\(tj - u)/b\. Applying the 
same argument for (A.l) completes the proof for (A. 3) and, in turn, (A. 4). 

We then show Part 2. Define S Uj i(t) = n _1 E"=i K b (tj - t)(tj - t) 1 , I = 
0,1,2, and S n (t) = X(t) T W(t)X(t). Then, 



S n (t) = n 
{Sn(t)}- 



S n ,o(t) S nt i(t) 
S n ,l\t) S nj 2(t) 
1 



n[S n>0 (t)S n , 2 (t) - {Sn^t)}*} 
According to (A. 3), uniformly in t G [0, 1], 



S n ,2{t) —S nj i(i) 
-S n ,i{t) Sn,o(t) 



{Sn(t)} 



-1 



nb 2 [f(t) + 0{l/(nb)}\ 

b 2 [a 2 (t) + 0{l/(nb)}} 
-b[ ai (t) + 0{l/(nb)}] 



-b[ ai (t) + 0{l/(nb)}] 
a (t) + O{l/(nb)} 



where ai(t) = ^ t ^ h K(u)u l du, I = 0,1,2, are all uniformly bounded in t 
and f(t) = a (t)a 2 (t)-{a 1 (t)} 2 is minimized at t = with /(0) = 0.25 var(|J7|) > 
for a random variable U with density K(u). It is seen from the definition 
of Sd in (3.3) that 



Sd(i,j) 



Kbjtj - tj)/n 
f(U) + 0{l/(nb)} 



a 2 (ti) + O 



1 

nb 



a 1 {t i )+0[ — 



Note that when j — nbL, . . . , i + nbL] , K b {tj — tj) = implies Sd(i,j) = 0. 
Also, note that when j G [i — nbL, . .. ,i + nbL], \tj — ti\/b < L. Thus, there 
exists C > uniformly in i G {1, . . . , n} and j € [i - n6L, . . . , i + n6L] such 
that \S d (i,j)\ < Cn' l K b {tj - U) < Csup t K(t)/(nb). Thus, for some C > 0, 
uniformly in n = 1, 2, . . . , i = 1, . . . , n and j = 1, . . . ,n, 



\Sd(i,j)\ 



= 0, 

<C/(nb), 



if | j — i| > nbL, 
if |j — i\ < n6L, 
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which implies (A. 5) and also implies that Sd is AVUS. Similar arguments for 
(A.5) combined with (A.4) and (3.3) yield (A.6). 

We now show Part 3. From (A.2), for any e > 0, there exists N such 
that n > N implies that E(Y 2 ) < e for all i G [nbL, . . . ,n — nbL]. It can 
also be shown that there exists C > such that E(Y 2 ) < C for all i £ 
[1, . . . , nbL] U [n — nbL, . . . , n\. Since B is AVUS, there exists C\ > such that 

SUp n > 1 SUp 1 < i < n Er=l \ B (h3)\ ^ Cl andsu Pn>l su Pl<i<nE"=l \ B (hj)\ < C\. 

The proof of (A. 7) is obtained as follows: 

-. n n n 

n~ l E(\\Byf) = -J2EJ2 B ( i ^) B ^ k ) E ( Y ^) 
n i=ij=ik=i 

1 n n n 

<-EEEi^jW^)i{^( y /)^ 2 )} 1/2 
n i=\ j=\k=i 

-i n ( n 
i=l Kj = l 

+ EE m,miM 

i=l j,k&[l,...,nbL]Vj[n-nbL,...,n] 

< e 1/2 C 1/2 Cf + 2C\CbL + 2C 2 C/n. 

Applying (A.5), (A.6) and similar arguments for (A. 7) completes the proof 
of (A.8). □ 

Lemma A. 4. Let {X„}™ =1 be a sequence of random variables such that 
every subsequence of X n has a further subsequence converging in distribution 
to a same random variable X. Then X n —>X. 

Proof. Let (fry denote the characteristic function of a random variable 
Y. Since any subsequence {ni}f2 =1 of {1,2,...} has a further subsequence, 

{ni j }j? =1 , such that X ni — >X, the Levy-Cramer continuity theorem [Shao 

(2003), page 56] implies that 4>x n (t) — ► 4>x(t) as j — > oo for any ( £ K. 

This, in turn indicates that 4>x n (t) —> 4>x{t) as n — > co for any We 

£ 

then conclude that X n — ► X by repeated application of the Levy-Cramer 
continuity theorem. □ 

Lemma A. 5. Let {ej}™ =1 be a stationary g-dependent sequence 
with E(ei) = and E{e\) < oo. Set x n j = T n ^ei, i = l,...,n, where {T n> i} is 
independent of {ej. Define cr^({r ni i}) = E[(£,i=i Xn,i) 2 \{T n ,i}\ ■ 
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sup n >i sup!<j< n \r n ,i\ < C and ^({tv^}) =na\{\ + o P (l)} for some con- 
stants C > and o\ 6 (0, oo), then a~ x Ya=i x n,i — > iV(0, 1) . 

Proof. The proof follows directly from Theorem 1.7 of Bosq (1998), 
page 36. It can be achieved by applying the blocking arguments and Lya- 
punov's central limit theorem. □ 

Lemma A. 6. Assume model (3.2) and Conditions A1-A7. Then: 

1. all three matrices Vr = V(I — So), Vl = (1 — Sd) T V and V = (I — 
S d ) T V(I-S d ) areAVUS; 

2. var(n~ 1 ^J 1 ^ i y^ 2 ^ 2 ) 0/or allji,j 2 = l,...,r and all 4, 4 = l,...,m; 

3. n-^R^S-Ein-^R-l^^O; 

4. all entries of E(n~ 1 S T R~ 1 S) are bounded; 

5. all convergent subsequences of -E(n -1 S T i? -1 S) are positive definite. 

Proof. The proof of Part 1 can be obtained from applying Lemma A.l, 
Lemma A. 2 and part 2 of Corollary A.l. 
We next show part 2. For £±, i 2 = 1, . . . , m, 

n n 

fc 1= ifc 2 =i 

n n 

= n_1 E s ji( t k l -te 1 )s j2 {tk 2 -ti 2 )V{k 1 ,k 2 ) 

ki=ii k 2 =i 2 
= h,X + Il,2+h,3+Il,A, 

where 

n n 

h,i=n~ l p h p j2 E V(ki,k 2 ), 
k 1 =e 1 k 2 =t 2 

n n 

h,2 = n~ 1 p j2 {sj^t^ -t £l ) -p h } V(h,k 2 ), 
ki=£i k 2 =e 2 

n n 

h,z = n- l p jl ^ { s j2( t k 2 -te 2 )-p j2 } E Vfaki), 

k 2 =£ 2 ki=£i 
n n 

h,A = n ~ l Yj J2 Unfai -tti) ~ Pji}{sj 2 (tk 2 ~k 2 ) -p j2 }V(k 1 ,k 2 ). 

k x =i x k 2 =l 2 

It is easily seen that 

(A.9) var(li,i) = 0, var(Ii )2 ) = 0(n~ l ) and var(/i, 3 ) = 0(n~ l ). 
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For Ii )4 , 

-y n n n n 

= 3 E E E E ii s h (*fci - - Pii }{*»ia (**a - **a) - Pia } 

fcl=*i k 2 =£ 2 k 3 =h k A =l 2 

x {Sji (*fc 3 -tti) ~ Pji}{ s j2( t k 4 ~tt 2 ) ~Pj 2 }} 

xV(h,k 2 )V(k 3 ,k 4 ), 

in which two situations will be discussed. In the situation where j\ = j 2 = j, 
the additive term above is nonzero only in the following four cases: 
I: k l -£ l = k 2 -£ 2 = k 3 -£ 1 = k A -£ 2 - 

II: {k 1 -£ 1 = k 2 -e 2 }^{k 3 -h = k 4 -£ 2 }; 
III: {fci - e l = k 3 - 4} + {k 2 - £ 2 = h - £ 2 }; 
IV: {k l -e 1 = k A -£ 2 }^{k 2 -£ 2 = k 3 -£ l }. 
Thus, E{ll A ) = Ei + En + Em + E w , where 

n 

Ei < n- 2 bi(l - P;){p? + (1 - P,) 3 }] E {^(^i-fci + - ^i)} 2 = 0(n~ 2 ), 

ki=l 

n n 

^ii<n- 2 [^(l- Pj )] 2 E E \V(kiM+t2-ti)\ -\V(k 3 ,k 3 + £ 2 -£x)\ 

fcl=l fc3=l 

= 0(n~ 2 ), 

n n 

E m <n- 2 [p,(l-p,)] 2 E J2{V(ki,k 2 )} 2 = 0(n- 2 ), 

fc 1= ifc 2 =i 

EwKn^lpjil-pj)) 2 E E 1^1^2)1- 1^(^2-4 + 4^1 +4 
fc 1= ifc 2 =i 

= 0(n" 2 ). 

Hence, E(I 2 ^) = 0(n~ 2 ) when ji = j2- In the situation where ji / j'2, since 
the Sj 1 (-) are independent at different time points and, similarly, the Sj 2 (-) 
are independent at different time points, E[{sj 1 (tk 1 — %) — Pn}{sn (ife 3 — 
~ PjAishitki ~te 2 ) - Pj 2 }{s h (t k4 - t i2 ) -p j2 }] is nonzero only if fci = fc 3 
and k 2 = k/±. In this case, 

^(A 2 4) 

-y n n n n 

= 3 E E E E E \^ s h (*fci - - Pji (*Ais - *4) - Pn } 

k 1 =£ 1 k 3 =hk 2 =£2k A =h 

X {s j2 (t fc2 - t fe ) -Pj a }{Sj a (tfc4 - */ a ) -Pj a }] 



24 



C. ZHANG AND T. YU 



xV(h,k 2 )V(k 3 ,k 4 ) 

Y n n 

~2 E E ^[{ s ii(*fei - Ui) -Ph} 2 {sj2(tk 2 - te 2 )-Pj 2 } 2 } 



ki=h k 2 =l 2 



n 



n n 

! CEE {V(h,k 2 )} 2 = 0(n 

ki=h k 2 =£ 2 



Thus, in both situations, var(2i4) — > 0. This, combined with (A. 9), yields 
Part 2. 

We then show Part 3. Recall that S = (I — Sd)S = [Si,...,S r ], where 



(I - S d )Sj. Then, 



n 



-1ST 



n 



T -r Sl 



ST R ^Si 



S~, R, Si- 



It suffices to consider the block matrix n 1 Sj 1 R l Sj 2 , whose (£i,^2)th entry 

is c hA;h,h = n ~ 1 t'ji,tiV£j3S2- B y Part 2 ' WCjiAyVz) °) which, in 

■p 

turn, gives Cj 1 ^ 1 -j 2 ^ 2 — E(Cj lt £ ll j 2 ^ 2 ) — >0 and the conclusion of Part 3. 
We now show Part 4, which can easily be derived from 



n 



"^Jia^aI^*" 1 E E im^)l<c 

fc 1= lfe 2 =l 



since the entries of Sj are either or 1. 



Last, we show Part 5. For any {ra^}^^ such that E(n, S i? S) con- 
verges to some limit M, by Part 3, n^" 1 S T i?~ 1 S -S> M. Obviously, M is semi- 
positive definite. It remains to show that M is nonsingular. We now prove 
this by contradiction. Assume that there exists some c = (cj , . . . , c^) T £ 
W m , where Cj = (c^i, . . . , Cj, m ) T , j = 1, . . . , r, such that c / and c T Mc = 

0. Then, n'^ 1 c T S T R~ 1 Sc-^0. By the Schur decomposition, there exist Q 
and v\,... ,v nk such that V = Q T diag(-yi, . . . ,v nk )Q, where Q T Q = I nk and 
Vn k < • • • < v±. Furthermore, from Condition A3, V is positive definite with 
eigenvalues bounded away from and oo. Thus, there exist constants a and 
b such that < a < b < oo and < a < v nk < • • • < v\ < b < oo. Noting that 

2 



■ 1 c T S T J R" 1 Sc = c T diag(^i 



j • • • j v n k 



)c > a c 



where c = QSc/ \/n^, we conclude that as k — > oo, 



(A.10) 



(I-5 d )Scf/n fc ^0. 
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Consider 

||(I - S d )Sc\\ 2 /n = ||{Sc - E(S)c} - {S d Sc - E(S)c}\\ 2 /n 

(A.ll) 

= >h — 2J2 + J3, 

where Ji = ||Sc - E(S)c\\ 2 /n, J 2 = {Sc - E(S)c} T {S d Sc - E(S)c}/n and 
J3 = \\SdSc — E(S)c\\ 2 /n. By the law of large numbers and block arguments, 
we can show that 



(A.12) Jx^var^cJs^J > " X>iJ (£pjIM 2 J >0, 
where s^j = Sjej, i = 1, . . . , m. For J3, note that 



£(J 3 )<o(l) + ^££ 



2 n 



r m ( n 



i=i Lj=ifc=i U=i 

II II 2 r m 

< o(l) + VVfn- 2n6L) 
j=ifc=i 



x sup E 

i€ [n6L,...,n— n&L] 



2~l 



^2S d (i,£)sj(te-tk) -pj 



|2 r m 



i=ifc=i 



x sup E 1 

i£[l,...,nbi]u[n— nbL,...,n] 
11 ||2 r 



^2S d (i,£)sj(t t -t k ) - pj 



j=l k=i 



x sup E 

i6 [nbL,. . .,n—nbL] 



1 " 

{1 + 0(1)}- e A 6(*« - - **) - pj 



1 11 2 r m 

+ }±£Lj2Y, 2nbL 

71 j=lk=\ 



x sup E 

iS[l,...,nbL]U[n—nbL,...,n] 



2-1 



^2s d (i,l)sj(te - t k ) - pj 



^•=ifc=i 



,i=ifc=i 
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In the last equality, the second term follows from (A.2), whereas the third 
term uses \sj(t£ — tk)\ < 1 and the fact that Sd is AVUS. Thus, E(J^) — > 
and J3 > imply that 

(A.13) h = o P (l). 

By the Cauchy-Schwarz inequality, | J 2 | < 2(Ji J3) 1 / 2 . Thus, J 2 = op(l). 
This, together with (A.ll), (A. 12) and (A.13), shows that S d )Sc\\ 2 /n^> 
var (X^=i c J s j,m), which contradicts (A. 10). □ 

Lemma A. 7. Assume Condition A. Suppose that n _1 S T i? _1 S —> M, 



h^A^o^M- 1 ). 



positive definite. Thenn 1 S T R 1 S— >M andn 1 / 2 ^ 



PROOF. From (3.5), h- h =Jn- 1 S T fi- 1 S)- 1 ?i- 1 / 2 ( J£+ J^) where Jf = 
n-V^R^d and J 2 * = n- l ' 2 S T R^e. Let Ji = n" 1 / 2 ^"^ and J 2 = 
n -i/2gT^-i~ rp^g proof proceeds in three steps to show that J\ = op(l), 



J 2 ^A(0,cr 2 M) and 
(A.14) 



n-^iR^-R-^S^O, 



J* -J 1 = 0p (i), 
J* -J 2 = 0P (1). 

First, we will show J\ = op(l). From (A. 6), the ith entry of (I — Sd)d is 



d(U) - {1 + o(l)}~ E " *<) 

■(l-tj)/6 



i=i 

d(ti)-{l + o(l)} 



K{u) d(U + u6) cin + O 



1 



= {1 + o(l)} [ L {ubK(u)d'(ti) + uViffuld" (£)/2} du + o(l) + O ( 4 

= 0(1), 

uniformly in i 6 [n&L, . . . , n — n6L] . When i 6 [1, . . . , n6L] U [n — n6L, . . . , n] , 

1 " 

- {1 + 0(1)}- £ - *i)d(^; 



< sup 1 + sup 



te[o,i] 
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(A.15) 



sup |ej d| = o(l), 

id[nbL,...,n— nbL] 

sup sup \e[ d| < C. 

n>l ie[l,...,nbL]U{n-nbL,...,n] 



Consider the jth block vector of J\\ Jij = n 1 ' 2 SjR 1 d. Its ith entry is 
ef J 1}j = n-^Vil - S d )(S jti - Pj l). Then, 

EiefJu) = n-WdFVQ. - S d ){ Pj (0tn Ci+if " PA) 

v^y(i-^)(if_ 1)0 ^ i+1 ) T 



- Pj n 



i-l 



k=l t=l 

afldthusl^efj^)!^^ 

by (A.15) and the fact that V R is AVUS. Moreover, 

"0 



var(ef Jij ) = n" 1 d T y(I - S, 



Pj (l- Pj )l 

< 71-^(1 - Pj )\\V L d\\ 2 , 



(i - s d y vd 



which implies that var( J\j) = l m l^o(l), using similar derivations for (A. 7). 
Thus, J\j — > and hence J\ — > 0. 

Second, we will show that J 2 N(0, <7 2 M). Since e = e — S d e, J 2 = n~ x l 2 x 
S T i? _1 e — n~ l / 2 S T R -1 S d e = J21 — ^22- Consider the jth block vector of J22: 



J 22 j = n-^SjR^Sde. Its ith entry is efj 22J = n" 1 /^^)^^ - S d )£ 



thus 



3,i> 



E(efj 22J ) = E[E{ n - 1 / 2 (S d e) T V(l - \, = 



and 



var(ef J 2 2,j) = var{£(e/ J 22 j\e)} + E{var(ej J 22 j\e)} 



n 



-1„2 



$ var{(5 d 6) J V(I - ^(l^, 0l_ t+i y } 



+ n- 1 i?((^e) T y(I-5 d ) 





Pj (i- Pj )i 



(I-S d ) T V(S d e)} 



<n-V^\\R^S T d V(l-S d )(ll^l_ t 

+ n - 1 p J (l- Pl )E(\\V L S d e\\ 2 ) 
= o(l) +o(l) = o(l). 



\T\\2 



+1' 
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In the last equality, the first o(l) is from Lemma A. 2 and similar arguments 
for (A. 8). The second o(l) is from (A. 8) and Part 1 of Lemma A. 6. Thus, 
J22J = op(l) and J22 = op(l). For J21, by the Cramer-Wold device, it suf- 
fices to show that for any w = (wj, wJ) T G W rm , w T J 2 i ^ N(0, cj 2 w t Mw), 
where Wj = (wj tl , . . . ,w j>m ) T , j = l,...,r. Note that w T J 21 = n~ 1 / 2 T2=x T n,i^i^ 
where V = Ej=iEZ=i Efc=i Vjfe - ^^(fe,*). Thus, 

Wni\ < rm\ max max \wj A ) \VL(k,i)\ < C, 

\l<j<rl<e<m J ' J f—'' 

where the last inequality is from Part 1 of Lemma A. 6. Also, &n({ T n,i}) = 
nvar(w T J 21 |{r nii }) = a 2 w T S T R~ l Sw = na 2 w T Mw{l + o P (l)} = na 2 a {l + 
op(l)}, where a 2 = o" 2 w T Mw. By Lemma A. 5, the result follows. 
Third, to verify (A. 14), it is sufficient to show that 

(A.16) n" 1 S T ( J R" 1 - J R~ 1 )S = o P (l), 

(A.17) n- 1 / 2 S T (^" 1 -^ 1 )d = o P (l), 

(A.18) n-VWiR- 1 - R- 1 )I = o P (l). 

Note that Condition A8 implies that R" 1 — R" 1 is AVUS. Similar arguments 
for Lemma A. 6, J\ and J 2 complete the proofs for (A.16), (A.17) and (A.18), 
respectively. □ 

Corollary A. 2. Assume Condition A. Then, 

1. h^h; 

2. (Ah - ^lh) T {yl(S T J R- 1 S)- 1 A T }- 1 ( J 4h - Ah) a 2 x 2 k - 

Proof. By Lemma A. 6, for any subsequence {ni}?^_ 1 , there exists a 
further subsequence, {n^jj^i, such that n i ~ 1 S" r i2~ 1 S —> M; for some posi- 
tive definite matrix Mj. For this {ni^JL-^, an appeal to Lemma A. 7 gives 

nl 1 S T R- 1 S^M l and n} /2 (h - h) h N(0, cr 2 Mf 1 ). 

^ P 

It follows that along {ni j }'jL 1 , h— > h as j — ► 00. Thus, for any subsequence 

of h, there exists a further subsequence along which h^h. This gives h— >h 
as n — > 00. 

We now show the second part. Applying Slutsky's theorem gives that 
as j — ► 00, {yl(S" ir i?~ 1 S)~ 1 A- r } _1 / 2 A(h — h) has an asymptotic Gaussian 
distribution with mean vector zero and variance-covariance matrix cr 2 I|<, 
which implies that, for {n; j .}^? =1 , 

(h-h) T A T {A(S T R- 1 Sr 1 A T }- 1 A(h-h)^a 2 xl as j^oo. 
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Applying Lemma A.4, we deduce that (h- h) T A T {A(S T R- 1 S)- 1 A T }- 1 A(h- 
h) — > <7 2 Xk as n ~^ 00 • O 

Lemma A. 8. Assume Condition A. Then r T R^r / n a 2 . 

Proof. By the definition of r, 
(A.19) r = y — Sh = (I — S d )(y - Sh) = S(h - h) + d + e. 
Notice that n _1 r T ii _1 r = n _1 f T -R _1 r + ri~ 1 r r (i?~ 1 — i? _1 )r, in which 
n- l T T R- l v = rx- 1 ||i?- 1 / 2 {S(h - h) + d + e}f 

= n _1 ||i2- 1/2 {S(h - fi) + d - S d e] + iT 1/2 e|| 2 
= h + 2I 2 + h, 

where h = n'^R' 1 / 2 {S(h - h) + d - 5 d e}|| 2 , J 2 = n _1 {S(h - h) + d - 
Sde} T R~ 1 e and ^3 = n _1 ||i? _1//2 e|| 2 . The proof will be completed by show- 
ing that h = o P (l), I 2 = o P (l), h = a 2 +o P (l) and n- 1 r T {Rr 1 -R- 1 )r = 
o P (l). 

First, consider I\. Note that 
h=n- 1 \\R- 1 / 2 {S(h-h) + d-S d e}f 

< 3^- 1 || J R~ 1 / 2 S(h - h)|| 2 + Zn^WR-^df + Sn^WR-^Sdef. 

The first term is op(l) by Lemma A. 6 and Corollary A. 2, the second and 
third terms are both op(l) by (A. 8), (A. 15) and similar derivations for (A. 7). 
Thus, h = o P (l). 

Second, consider ^3 = n~ x Ya=i Y%=i e (U)t{tj)V(i, j). Then, 

E(I 3 ) = n' l E{e T R' l e) = n~Hrace{-E(ee T )ir~ 1 } = a 2 , 

n n n n 

E(I 2 )=n- 2 Y / E E E^^'^^^)^-^)' 

fcl=l fc2 = l kg = l /C4 = l 

where e(k\, k 2 , k%, k^) = £ , {e(i/ Cl )e(i/ C2 )e(tfc 3 )e(ifc 4 )}. There are only four pos- 
sible cases in which e(k±, k 2 , k%, k&) is nonzero. In Case 1, for any i £ {k\,k 2 , 
ks, k{\, there exists j E {ki,k 2 ,k 3 , k{\, i 7^ j, such that \i — j\ < g. Then, 

n " 2 E E E E e ( fc i< fe 2, fc, fc4)^(A:i , fca)V (Afc, fc 4 ) 

Case 1 
n 

<-" 2 E E E E £[{^i)} 4 ] 

fci=l k2'.\k2— ki <g k^:\k^—k\\ <g k^: \k±— k\ \<g 

x \V(h,k 2 )\-\V(k 3 ,h)\ 

<n~ 2 C 2 . 
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In Case 2, \k\ — k 2 \ < 9 an d \k 3 — k^\ < g, but for any i G {k\, k 2 } and j G 
{A; 3 ,fc 4 }, > 9- Then, 

n ~ 2 E E E E ^i.fe.fe,^^! , k 2 )V(k 3 ,k 4 ) 

Case 2 

=^ 4 EE/ ? (i A; 2-fcii)^(fci^ 2 )EE- o (i A: 3-A;4i)m^4) 

ki,k2 k3,ki 

-» _ vi;x;i;x;p(ife-tii)^-A4i)%w*3^) 

Case 1 

= {£(/ 3 )} 2 -0(n- 2 ). 

In Case 3, | /^i — 1 < c/ and \k 2 — k±\ < g, but for any i G {ki, k 3 } and j G 
{k 2 , k A }, > g. Then, 

n ~ 2 E E E E e ( fc i < fc 2, fcs, M^Cfci. fe)^(*3, fc 4 ) 

Case 3 

n n 

<^ 4 E E E E p(\k3-h\) P (\h-k 2 \) 

ki=l k3:\k3— fci <g fc2=l &4: &4 — fc2|<9 

X |V(fci,fc 2 )| • 1^(^3,^4)1 

A similar result holds for Case 4, where \ki — k&\ < g and |fe — k 3 \ < g, 
but for any i G {fci, k^} and j G {A^j ^3}) V — j\ > 9- Combining Cases 1-4, 

E(I 2 ) -» {£(/ 3 )} 2 , which leads to I 3 ^E{I 3 ) = a 2 . 

Third, consider I 2 . By the Cauchy-Schwarz inequality, I 2 = op(l). 

Fourth, to deduce ri _1 r T (i? _1 — i? _1 )r = op(l), it is sufficient to show 
that 

(A.20) n _1 {S(h - h)} T {R^ - iT 1 )!^ - h) = o P (l), 

(A.21) n^ 1 d T (i?^ 1 -i?- 1 )d = o P (l), 

(A.22) fT^CR" 1 - J R" 1 )? = o P (l). 

It is easy to see that (A.20) follows from (A. 16) and Corollary A. 2, whereas 
(A.21)-(A.22) are obtained by similar arguments for (A.16)-(A.18). □ 

Lemma A. 9. Assume Condition A. Then: 

1. n~ l / 2 S T R~ l d = op{l); 

2. (Ah bc - Ah) T {A(S T R- 1 S)- 1 A T }- 1 (Ah hc - Ah)^a 2 X 2 k ; 

3. n^rlR-^ hc ^a 2 . 
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Proof. To show the first part, note that d = (I — S d )d = (I — Sd)Sd(y — 
Sh) = S d (I - S d )(y - Sh) = S d r. Thus from (A.19), 

(A.23) d = (I - S d )S d S(h - £) + S d d + (I - S d )S d e, 

in which 

(I - S d )S d ^ e = (I - S d )S d [{^ e - E{<- jA )} - { Pj l - E(£ jtt )}] 

= (I - S d )[S d {t jtt - E(^ t )} - S^jlj^Ol-j+if}- 

Note that n~ 1 / 2 S T R~ 1 d = h+h+h, where h = n - l / 2 S 7 iT^I- S d )S d S{h- 
h), I 2 = n- 1 / 2 S T R- 1 S d d and / 3 = n - 1 / 2 S T R- 1 (I - S d )S d e. We now show 
that each term is op(l). For I\, from Lemma A. 7, we have n 1 / 2 (h — h) = 
Op(l), thus we only need to consider the matrix n _1 S T i? _1 (I — S d )S d S. For 
its block matrix n~ 1 Sj i R~ 1 (I — Sd)SdSj 2 , the (^i,£2)th entry satisfies 

\n~ x e^S^R-\l- S d )S d S j2 e e2 \ 
</n(/i2 + /i 3 ), 

where J u = n^p-i/as^ || , j 12 = n" 1 ^^-!^ _ S d )S d {H h/2 - 
E(£ j2jt2 )}\\ an dJ 13 = n-V2||i ? -i/2 (I _5 d) ^ (p . 2l T_ i)0 T_. 2+i) T|| Thenby 

Lemma A. 6, In = Op(l). By (A. 8), I12 = o(l) and, similarly, I13 = o(l). 
Thus l\ = op(l). For I2, using the same procedures as in Lemma A. 7 for 
proving J\ = op(l), we can show I2 = op(l). For I3, using the same proce- 
dures as in Lemma A. 7 for proving J22 = op(l), we obtain I 3 = op(l). Thus, 

n^/^R^d = o P (l). It remains to show that n" 1 / 2 ^^ 1 - R- X )d = 
op(l), whose proof is similar to that of (A. 17). 

To show the second part, recall that hb c = h— (n~ 1 S r i? _1 S)~ 1 (n~ 1 S- r i? _1 d). 
Using the first part together with Lemma 6 and Corollary 2 leads to the sec- 
ond part. 

To show the third part, note that f b c = ?— d and n~ 1 r b " c i? _1 rb c = ra -1 ?^ x 
i? _1 fb c +n~ 1 r^ c (R" 1 — i? _1 )rbc, in which 

n-^R' 1 ^ = n^WR- 1 ' 2 ^ - d)|| 2 =J 1 - 2J 2 + J 3 , 

where Ji = n" 1 ])^- 1 / 2 ? || 2 , J 2 = 2ra- 1 ? T iT 1 d and J 3 = n" 1 || J R- 1 / 2 d|| 2 . From 
Lemma A.8, Ji = a 2 + op(l). From (A.23), 

J 3 = n~ l \\R~ 1/2 {(I - S d )S d S(h - Ii) + S d d + (I - S d )S d e}\\ 2 

^Sn-'iWR-^il - S d )S d S(h-h)\\ 2 

+ WR-^S.dW 2 + ||i?-V2(l _ Sd )S d e\\ 2 }. 
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Using similar proofs for the numerator, we obtain J3 = op(l). By the Cauchy- 
Schwarz inequality, J2 = op(l). Thus, ?7,~ 1 r^ c i?~ 1 rb c o 2 . To show n _1 r^ c x 
(R^ 1 — -R~ 1 )rbc = op(l), it is sufficient to show that 

(A.24) n- 1 f T (.R- 1 - J R- 1 )? = op(l), 

(A.25) rT 1 ^ (RT 1 - i?~ 1 )d = o P (l), 

where (A.24) directly follows from the fourth step of the proof for Lemma 
A. 8 and (A.25) uses similar proofs for (A.21)-(A.22). □ 

A.l. Proof of Theorem 4.1. From Corollary A. 2, under Hq in (4.1), the 
numerator of K converges in distribution to o" 2 Xk- This, combined with 
Lemma A. 8, gives the desired result for IK. 

A. 2. Proof of Theorem 4.2. Under Hq in (4.1), the second and third 
parts of Lemma A. 9 complete the proof for K^ c . 

A. 3. Proof of Theorem 4.3. The numerator of K can be decomposed into 
three additive terms: 

h = (Ah - Ah) T {A(S T R~ 1 Sy 1 A T }~\Ah - Ah); 

1 2 = 2n(Ah) T {A(n~ 1 S T R~ 1 Sy 1 A T }- 1 (Ah - Ah); 

1 3 = n(Ah) T {A(n- 1 S T E- 1 S)- 1 yl T }- 1 (Ah). 

Notice that 1\ -^>cr 2 Xk by the second part of Corollary A. 2; ^3 = n(Ah) T x 
(AM- 1 A T )- 1 Ah{l+o P (l)} by Lemma A.7 and H x in (4.1); I 2 = O p (^/E) 
by the Cauchy-Schwarz inequality. These, along with Lemma A. 8, complete 
the proof for K. The proof for Kb c is similar and is hence omitted. 

A.4. Proof of Theorem 4.4. Following Lemma A.7, under H\ n in (4.2), 
n 1 / 2 Ah-^N(c,a 2 AM~ 1 A T ). Thus 

(vTR-^yi 2 u ; 1 ' 

This completes the proof for K. Similar proofs for HCb c are omitted. 
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